- Asia > Middle East > Saudi Arabia (0.04)
- North America > United States > Connecticut > New Haven County > New Haven (0.04)
- North America > Canada (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Research Report > Experimental Study (0.92)
- Research Report > New Finding (0.67)
- Government (1.00)
- Law (0.67)
- North America > Canada > Alberta (0.14)
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Law (0.67)
MassSpecGym: A benchmark for the discovery and identification of molecules
Bushuiev, Roman
Despite decades of progress in machine learning applications for predicting molecular structures from MS/MS spectra, the development of new methods is severely hindered by the lack of standard datasets and evaluation protocols. To address this problem, we propose MassSpecGym, the first comprehensive benchmark for the discovery and identification of molecules from MS/MS data.
- North America > Canada > Alberta (0.14)
- North America > Canada > Ontario > Toronto (0.14)
- Europe > Czechia (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Government > Regional Government (0.68)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Data Science (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- Europe > Germany > North Rhine-Westphalia > Düsseldorf Region > Düsseldorf (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
- Research Report > Experimental Study (0.92)
- Research Report > New Finding (0.67)
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Idaho (0.04)
- Europe > Germany > North Rhine-Westphalia > Düsseldorf Region > Düsseldorf (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- Europe > Netherlands > North Brabant > Eindhoven (0.04)
- Europe > Germany > North Rhine-Westphalia > Düsseldorf Region > Düsseldorf (0.04)
Automatic debiased machine learning and sensitivity analysis for sample selection models
Bjelac, Jakob, Chernozhukov, Victor, Klotz, Phil-Adrian, Kueck, Jannis, Schmitz, Theresa M. A.
In this paper, we extend the Riesz representation framework to causal inference under sample selection, where both treatment assignment and outcome observability are non-random. Formulating the problem in terms of a Riesz representer enables stable estimation and a transparent decomposition of omitted variable bias into three interpretable components: a data-identified scale factor, outcome confounding strength, and selection confounding strength. For estimation, we employ the ForestRiesz estimator, which accounts for selective outcome observability while avoiding the instability associated with direct propensity score inversion. We assess finite-sample performance through a simulation study and show that conventional double machine learning approaches can be highly sensitive to tuning parameters due to their reliance on inverse probability weighting, whereas the ForestRiesz estimator delivers more stable performance by leveraging automatic debiased machine learning. In an empirical application to the gender wage gap in the U.S., we find that our ForestRiesz approach yields larger treatment effect estimates than a standard double machine learning approach, suggesting that ignoring sample selection leads to an underestimation of the gender wage gap. Sensitivity analysis indicates that implausibly strong unobserved confounding would be required to overturn our results. Overall, our approach provides a unified, robust, and computationally attractive framework for causal inference under sample selection.
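The abstract attributes the tuning sensitivity of conventional double machine learning to its reliance on inverse probability weighting. The sketch below isolates only that mechanism: a plain inverse-probability-weighted (Horvitz-Thompson style) estimate of E[Y] when Y is observed only for selected units. It is not the authors' DML or ForestRiesz estimator. The data-generating process, the `p_min` floor on selection probabilities, and the function names `ipw_mean` and `simulate` are all hypothetical choices for illustration, and the true propensities are assumed known.

```python
import random
import statistics


def ipw_mean(ys, observed, propensities):
    """IPW estimate of E[Y]; outcomes of unselected units (observed == 0) get zero weight."""
    return sum(s * y / p for y, s, p in zip(ys, observed, propensities)) / len(ys)


def simulate(p_min, n=500, reps=200, seed=0):
    """Return mean and standard deviation of the IPW estimate across replications.

    Hypothetical design: x ~ U(0, 1), selection probability p(x) = p_min + (1 - p_min) * x,
    outcome y = 1 + x + N(0, 1), so the true E[Y] is 1.5.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        ys, obs, ps = [], [], []
        for _ in range(n):
            x = rng.random()
            p = p_min + (1 - p_min) * x          # known selection propensity
            ys.append(1.0 + x + rng.gauss(0.0, 1.0))
            obs.append(1 if rng.random() < p else 0)
            ps.append(p)
        estimates.append(ipw_mean(ys, obs, ps))
    return statistics.mean(estimates), statistics.stdev(estimates)


for p_min in (0.5, 0.01):
    m, sd = simulate(p_min)
    print(f"p_min={p_min}: mean={m:.3f}, sd={sd:.3f}")
```

Both settings are unbiased here because the propensities are exact, but the spread of the estimate should be noticeably larger when `p_min` is small, since weights of up to `1 / p_min` dominate the sum. Trimming or clipping propensities reduces that variance at the cost of bias, which is one source of the tuning sensitivity the abstract describes.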
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Europe > Germany > North Rhine-Westphalia > Düsseldorf Region > Düsseldorf (0.05)
- Europe > Czechia > Prague (0.04)
- Europe > Slovakia (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.88)
Understanding Syntactic Generalization in Structure-inducing Language Models
Arps, David, Sajjad, Hassan, Kallmeyer, Laura
Structure-inducing Language Models (SiLMs) are trained on a self-supervised language modeling task and induce a hierarchical sentence representation as a byproduct when processing an input. SiLMs couple strong syntactic generalization behavior with competitive performance on various NLP tasks, but many of their basic properties remain underexplored. In this work, we train three different SiLM architectures from scratch: StructFormer (Shen et al., 2021), UDGN (Shen et al., 2022), and GPST (Hu et al., 2024b). We train these architectures on both natural language corpora (English, German, and Chinese) and synthetic bracketing expressions. The models are then evaluated with respect to (i) properties of the induced syntactic representations, (ii) performance on grammaticality judgment tasks, and (iii) training dynamics. We find that none of the three architectures dominates across all evaluation metrics. However, there are significant differences, in particular with respect to the induced syntactic representations. The Generative Pretrained Structured Transformer (GPST; Hu et al., 2024b) performs most consistently across evaluation settings and outperforms the other models on long-distance dependencies in bracketing expressions. Furthermore, our study shows that small models trained on large amounts of synthetic data provide a useful testbed for evaluating basic model properties.
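The abstract mentions synthetic bracketing expressions and long-distance dependencies without specifying the generator. The sketch below assumes a Dyck-style language with three bracket types (an assumption, not the paper's stated setup): `sample_dyck` draws a balanced string, and `dependency_arcs` recovers each opener/closer pair, whose index distance is one simple way to quantify how long a dependency is. All names and the `p_open` parameter are hypothetical.

```python
import random

PAIRS = {"(": ")", "[": "]", "{": "}"}  # assumed bracket vocabulary


def sample_dyck(n_pairs, rng, p_open=0.5):
    """Sample a balanced bracket string containing exactly n_pairs bracket pairs."""
    out, stack, opened = [], [], 0
    while opened < n_pairs or stack:
        can_open = opened < n_pairs
        # Open a new bracket if allowed (always when nothing is pending), else close the top one.
        if can_open and (not stack or rng.random() < p_open):
            b = rng.choice(list(PAIRS))
            stack.append(b)
            out.append(b)
            opened += 1
        else:
            out.append(PAIRS[stack.pop()])
    return "".join(out)


def dependency_arcs(s):
    """Return (open_index, close_index) for every bracket pair; raise ValueError if unbalanced."""
    openers = {close: open_ for open_, close in PAIRS.items()}
    stack, arcs = [], []
    for i, ch in enumerate(s):
        if ch in PAIRS:
            stack.append(i)
        else:
            if not stack or s[stack[-1]] != openers[ch]:
                raise ValueError(f"mismatched bracket at position {i}")
            arcs.append((stack.pop(), i))
    if stack:
        raise ValueError("unclosed brackets")
    return arcs


rng = random.Random(0)
s = sample_dyck(8, rng)
arcs = dependency_arcs(s)
print(s, max(close - open_ for open_, close in arcs))
```

Raising `p_open` tends to produce deeper nesting and therefore longer arcs, which is one knob a synthetic testbed of this kind could use to stress long-distance agreement.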
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Europe > Austria > Vienna (0.14)
AuditCopilot: Leveraging LLMs for Fraud Detection in Double-Entry Bookkeeping
Kadir, Md Abdul, Vasu, Sai Suresh Macharla, Nair, Sidharth S., Sonntag, Daniel
Auditors rely on Journal Entry Tests (JETs) to detect anomalies in tax-related ledger records, but rule-based methods generate an overwhelming number of false positives and struggle with subtle irregularities. We investigate whether large language models (LLMs) can serve as anomaly detectors in double-entry bookkeeping. Benchmarking state-of-the-art LLMs such as LLaMA and Gemma on both synthetic and real-world anonymized ledgers, we compare them against JETs and machine learning baselines. Our results show that LLMs consistently outperform traditional rule-based JETs and classical ML baselines, while also providing natural-language explanations that enhance interpretability. These results highlight the potential of AI-augmented auditing, where human auditors collaborate with foundation models to strengthen financial integrity.
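To make the rule-based baseline concrete, here is a minimal sketch of what JET-style rules over double-entry records might look like. The three rules (debits not equal to credits, weekend posting, round totals) and the `JournalEntry`/`jet_flags` names are illustrative assumptions, not the JET set used in the paper. The sketch also hints at why such rules over-flag: a legitimate round-sum payment posted on a Saturday trips two rules at once.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class JournalEntry:
    """Hypothetical journal entry: one posting date with debit and credit line amounts."""
    entry_id: str
    posted: date
    debits: list[Decimal]
    credits: list[Decimal]


def jet_flags(entry, round_unit=Decimal("1000")):
    """Apply illustrative JET rules and return the names of the rules that fire."""
    flags = []
    total_debit = sum(entry.debits, Decimal(0))
    total_credit = sum(entry.credits, Decimal(0))
    if total_debit != total_credit:          # double-entry invariant violated
        flags.append("unbalanced")
    if entry.posted.weekday() >= 5:          # Saturday (5) or Sunday (6)
        flags.append("weekend_posting")
    if total_debit > 0 and total_debit % round_unit == 0:
        flags.append("round_amount")
    return flags


normal = JournalEntry("JE-1", date(2024, 6, 3), [Decimal("1234.50")], [Decimal("1234.50")])
odd = JournalEntry("JE-2", date(2024, 6, 1), [Decimal("5000")], [Decimal("4000")])
print(jet_flags(normal), jet_flags(odd))
```

Amounts use `Decimal` rather than `float` so that the balance check is exact for currency values.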
- Europe > Germany > Saarland > Saarbrücken (0.04)
- Europe > Germany > North Rhine-Westphalia > Düsseldorf Region > Düsseldorf (0.04)
- Europe > Germany > Lower Saxony > Oldenburg (0.04)
- Europe > France (0.04)
- Law (0.47)
- Banking & Finance (0.47)
- Law Enforcement & Public Safety > Fraud (0.41)